Content Acquisition Optimization

Yahoo Webmessenger



• Update data sent to individuals logged into Yahoo’s Instant 
Messenger service online 

- Online contact status, unread emails in Yahoo inbox 

- Usually small sessions (2-4kB) 

• Sporadic collection (30,000 - 60,000 sessions per day) 

• Intermittent bursts of collection against contacts of targets 

- Large numbers of sessions (20,000+) against a single targeted selector 

- Not collected against the target (online presence/unread email from target) 

- No owner attribution (metadata value limited to fact-of comms for emails, 
online presence events for buddies) 

• Over a dozen selectors detasked in two weeks 

- Because a target’s contact was using/idling on Yahoo Webmessenger 

- Several very timely selectors (Libyan transition, Greek financial related) 
Address Books



• Email address books for most major webmail are collected as 
stand-alone sessions (no content present*) 

• Address books are repetitive, large, and metadata-rich 

• Data is stored multiple times (marina/mainway, pinwale, clouds) 

• Fewer and fewer address books attributable to users, targets 

• Address books account for ~ 22% of SSO’s major accesses (up 
from ~ 12% in August) 



Access (10 Jan 12)     Total Sessions   Address Books
US-3171                       1488453   237067 (16% of traffic)
DS-200B                        938378   311113 (33% of traffic)
US-3261                         94132     2477 (3% of traffic)
US-3145                        177663    29336 (16% of traffic)
US-3180                        269794    40409 (15% of traffic)
US-3180 (16 Dec 11)            289318    91964 (32% of traffic)
TOTAL                         3257738   712366 (22% of traffic)

Provider    Collected   Attributed   Attributed%
Yahoo          444743        11009         2.48%
Hotmail        105068         1115         1.06%
Gmail           33697         2350         6.97%
Facebook        82857        79437        95.87%
Other           22881         1175         5.14%
TOTAL          689246        95086        13.80%
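
The totals and percentages in the table above can be re-derived from the per-row counts. The following is a minimal Python sketch of that arithmetic, not part of the original briefing; the variable and function names are illustrative, and the counts are transcribed from the slide.

```python
# Counts transcribed from the slide: (total sessions, address books) per access.
# Dictionary and function names are illustrative, not from the source.
ACCESSES = {
    "US-3171": (1488453, 237067),
    "DS-200B": (938378, 311113),
    "US-3261": (94132, 2477),
    "US-3145": (177663, 29336),
    "US-3180": (269794, 40409),
    "US-3180 (16 Dec 11)": (289318, 91964),
}

# (address books collected, address books attributed) per provider.
PROVIDERS = {
    "Yahoo": (444743, 11009),
    "Hotmail": (105068, 1115),
    "Gmail": (33697, 2350),
    "Facebook": (82857, 79437),
    "Other": (22881, 1175),
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Address books as a share of all sessions, per access and overall.
total_sessions = sum(sessions for sessions, _ in ACCESSES.values())
total_books = sum(books for _, books in ACCESSES.values())
share_by_access = {
    name: pct(books, sessions) for name, (sessions, books) in ACCESSES.items()
}
overall_share = pct(total_books, total_sessions)

# Share of collected address books that were attributed, per provider and overall.
total_collected = sum(collected for collected, _ in PROVIDERS.values())
total_attributed = sum(attributed for _, attributed in PROVIDERS.values())
attributed_pct = {
    name: round(pct(attributed, collected), 2)
    for name, (collected, attributed) in PROVIDERS.items()
}
overall_attributed_pct = round(pct(total_attributed, total_collected), 2)

for name, share in share_by_access.items():
    print(f"{name}: {share:.2f}% of traffic")
print(f"TOTAL: {total_books} of {total_sessions} sessions ({overall_share:.2f}%)")
for name, value in attributed_pct.items():
    print(f"{name}: {value:.2f}% attributed")
print(f"TOTAL: {overall_attributed_pct:.2f}% attributed")
```

The column sums reproduce both TOTAL rows, and the provider percentages match the slide to two decimal places. The per-access shares agree with the slide's whole-percent figures when rounded, except US-3145, which computes to about 16.5% against the 16% shown.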



TOP SECRET//SI//NOFORN 









Buddy Lists, Inboxes 



• Unlike address books, frequently contain content data 

- Offline messages, buddy icon updates, other data included 

- Webmail inboxes increasingly include email content 

- Most collection is due to the presence of a target on a buddy list where the 
communication is not to, from, or about that target 



• NSA collects, on a representative day, ~500,000 buddy lists and
inboxes

- More than 90% collected because tasked selectors identified only as
contacts (not communicant, content, or owner)

• Identifying buddy lists and inboxes without content (or without
useful content) is an ongoing challenge

Scenario: [redacted]@yahoo

• [day illegible] Sep 2011: [redacted]@yahoo.com (tasked S2E, asw
Iran Quds Force) has his/her Yahoo account hacked by an
unknown actor, sends out spam email to his/her contact list:

[Screenshot: DNI Parser Webmail Display, Yahoo! Mail. Active user: (not shown)]

Scenario: [redacted]@yahoo

• [redacted]@yahoo.com has a number of Yahoo groups in his/her
contact list, some with many hundreds or thousands of
members

• At DS-200B in particular, collection spiked as:

- The initial spam messages were sent (and collected)

- Inboxes of email recipients were viewed by contact list

- Messages were sometimes viewed, but more often sent as precached
views on Google and Yahoo (along with inboxes)

- Inboxes where the recipient did not delete the spam message continued to
be collected every time they were viewed

- Some recipients added [redacted]@yahoo.com to their address books
(possibly as a spam defeat?) - address books were collected every time

Scenario: [redacted]@yahoo

[Chart: DS-200B Collection By Day - 11 Sep - 24 Sep (in MB)]

[Chart: DS-200B Collection By Hour - 18 Sep - 23 Sep (in MB)]

Scenario: [redacted]@yahoo

• [redacted]@yahoo.com emergency detasked from DS-200B and
US-3171 at 13:04Z on 20 Oct

• Numerous first-order address books and inboxes collected
meant tasked selectors on address books or buddy lists of
contacts of [redacted]@yahoo.com also affected:

- [redacted]@yahoo.com and [redacted]mail.com emergency
detasked off US-3171 at 13:10Z on 20 Sep

• Memorializing to PINWALE only address books and inboxes 
owned by target selectors would have reduced PINWALE 
volumes 90%+ 

- Site XKEYSCOREs would buffer data for SIGDEV purposes 

- Metadata from known owner address books and inboxes stored regardless 

Mobile IMAP



• IMAP protocol used by email clients 
to fetch mail from server(s) 

• Not designed for devices with 
intermittent connections (i.e. mobile 
phones) 

• Android implementation in 
particular uses a lot of bandwidth 



A0 CAPABILITY
A1 LOGIN
A2 CAPABILITY
A3 EXAMINE INBOX
A4 LIST "" INBOX
A5 LIST "" "INBOX.%"
A6 SEARCH SINCE 15-Aug-2011 UNDELETED ALL
A7 FETCH 17 (ENVELOPE INTERNALDATE RFC822.SIZE
A8 FETCH 17 (BODY.PEEK[HEADER])
A9 CLOSE
A10 LOGOUT
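
The command listing above follows the IMAP convention that a client prefixes every command with a unique tag and ends it with CRLF. As a rough illustration of why a client that reconnects often generates many small, repetitive sessions, the sketch below rebuilds the tagged sequence and counts how often it would repeat per day. It is a minimal sketch under stated assumptions: the function names and the 15-minute polling interval are hypothetical and do not come from the slide.

```python
from itertools import count

# Commands as listed on the slide, without their tags. LOGIN arguments are
# omitted on the slide and are left out here too. The closing parenthesis on
# the first FETCH is an assumption; the slide's line appears truncated.
POLL_COMMANDS = [
    "CAPABILITY",
    "LOGIN",
    "CAPABILITY",
    "EXAMINE INBOX",
    'LIST "" INBOX',
    'LIST "" "INBOX.%"',
    "SEARCH SINCE 15-Aug-2011 UNDELETED ALL",
    "FETCH 17 (ENVELOPE INTERNALDATE RFC822.SIZE)",
    "FETCH 17 (BODY.PEEK[HEADER])",
    "CLOSE",
    "LOGOUT",
]


def tag_commands(commands, prefix="A", start=0):
    """Prefix each command with a unique tag (A0, A1, ...), as IMAP clients do."""
    return [f"{prefix}{n} {command}" for n, command in zip(count(start), commands)]


def client_bytes_per_poll(tagged_lines):
    """Bytes the client sends for one poll; each line ends with CRLF (2 bytes)."""
    return sum(len(line.encode("ascii")) + 2 for line in tagged_lines)


def polls_per_day(interval_minutes):
    """Number of complete polls in 24 hours at a fixed polling interval."""
    return (24 * 60) // interval_minutes


tagged = tag_commands(POLL_COMMANDS)
for line in tagged:
    print(line)

# Hypothetical interval: a client that reconnects every 15 minutes repeats the
# whole login-to-logout sequence on each poll, even when no new mail exists.
print(polls_per_day(15), "polls per day,",
      client_bytes_per_poll(tagged), "client bytes per poll")
```

Server responses, which carry the envelope and header data, are not modeled here, so the byte count covers only the client side of each poll.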



[Screenshot: DNI Parser email display. Date: Fri Aug [remainder illegible]; From and To: not shown; Subject: 2nd Payment Reminder; Attachments: 0. Message body reads "DNI Parser: Document or message has no data"]

Print Notes



http://www.documentcloud.org/notes/print?docs[]=804763 



The NSA's overcollection problem 

9 Pages - Contributed by Matt DeLong, Washington Post - Oct 14, 2013 

The NSA's Special Source Operations branch manages "partnerships" in which U.S. and foreign telecommunications companies 
allow the NSA to use their facilities to intercept phone calls, emails and other data. This briefing describes problems with 
overcollection and NSA efforts to filter out what it does not need. 



What is a "session"? (p. 2)

- Usually small sessions (2-4kB)

• Sporadic collection (30,000 - 60,000 sessions per day)



"Selectors detasked" (p. 2)

• Over a dozen selectors detasked in two weeks

- Because a target's contact was using/idling on Yahoo Webmessenger

- Several very timely selectors (Libyan transition, Greek financial related)



MARINA/MAINWAY/PINWALE (p. 3)

• Data is stored multiple times (marina/mainway, pinwale, clouds)



Attributable (p. 3)

• Fewer and fewer address books attributable to users, targets



How many books are collected? (p. 3)

Why collect "buddy lists"? (p. 4)

Buddy Lists, Inboxes

• Unlike address books, frequently contain content data

- Offline messages, buddy icon updates, other data included

- Webmail inboxes increasingly include email content

- Most collection is due to the presence of a target on a buddy list where the
communication is not to, from, or about that target



~500,000 buddy lists and inboxes collected on a representative day (p. 4)

• NSA collects, on a representative day, ~500,000 buddy lists and
inboxes

- More than 90% collected because tasked selectors identified only as
contacts (not communicant, content, or owner)



A targeted account gets hacked (p. 5)

Scenario: [redacted]@yahoo

• [day illegible] Sep 2011: [redacted]@yahoo.com (tasked S2E, asw
Iran Quds Force) has his/her Yahoo account hacked by an
unknown actor, sends out spam email to his/her contact list:



Spammers complicate collection (p. 6)

Scenario: [redacted]@yahoo

• [redacted]@yahoo.com has a number of Yahoo groups in his/her
contact list, some with many hundreds or thousands of
members

• At DS-200B in particular, collection spiked as:

- The initial spam messages were sent (and collected)

- Inboxes of email recipients were viewed by contact list

- Messages were sometimes viewed, but more often sent as precached
views on Google and Yahoo (along with inboxes)

- Inboxes where the recipient did not delete the spam message continued to
be collected every time they were viewed

- Some recipients added [redacted]@yahoo.com to their address books
(possibly as a spam defeat?) - address books were collected every time



Targeted account detasked (p. 8)

Scenario: [redacted]@yahoo

• [redacted]@yahoo.com emergency detasked from DS-200B and
US-3171 at 13:04Z on 20 Oct

• Numerous first-order address books and inboxes collected
meant tasked selectors on address books or buddy lists of
contacts of [redacted]@yahoo.com also affected:

- [redacted]@yahoo.com and [redacted]mail.com emergency
detasked off US-3171 at 13:10Z on 20 Sep

• Memorializing to PINWALE only address books and inboxes
owned by target selectors would have reduced PINWALE
volumes 90%+

- Site XKEYSCOREs would buffer data for SIGDEV purposes

- Metadata from known owner address books and inboxes stored regardless